Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora
Abstract
The Nederlab project aims to bring together all digitized texts relevant to the Dutch national heritage and to the history of the Dutch language and culture (circa 800 – present) in one user-friendly, tool-enriched, open-access web interface. This paper describes Nederlab halfway through the project period and discusses the collections incorporated, the back-office processes, the system back-end, and the Nederlab Research Portal end-user web application.
Similar resources
Synergy of Nederlab and @PhilosTEI: diachronic and multilingual Text-Induced Corpus Clean-up
In two concurrent projects in the Netherlands we are further developing TICCL or Text-Induced Corpus Clean-up. In project Nederlab TICCL is set to work on diachronic Dutch text. To this end it has been equipped with the largest diachronic lexicon and a historical name list developed at the Institute for Dutch Lexicology or INL. In project @PhilosTEI TICCL will be set to work on a fair range of ...
Towards a Better Exploitation of the Brown 'Family' Corpora in Diachronic Studies of British and American English Language Varieties
Since the 1990s, the Brown ‘family’ corpora have been widely used for various diachronic studies of 20th century English language. However, the existing methodologies failed to exploit its full potential as they only used the four main text categories. In this paper, we present the results of two experiments on diachronic changes of the Coleman-Liau readability Index (CLI) in British and Americ...
The Integrated Language Database of 8th - 21st-Century Dutch
The Institute for Dutch Lexicology (INL) has a long-standing tradition in corpus-based lexicography. The results include electronic scholarly dictionaries of Dutch covering the vocabulary from 1200 up to 1976, linguistically annotated electronic text corpora of historical and present-day Dutch, and computational lexica. Added value to these data is given in an on-going long-term INL project, th...
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...
Quantitative approaches to diachronic corpus linguistics
English Historical Linguistics has a rich and long-standing tradition of corpus-based work (cf. the surveys in Rissanen 2008, Kytö 2012). Resources such as the HELSINKI corpus, the BROWN family of corpora, and ARCHER have spawned active research programs for the study of lexical and grammatical change, both long-term (Curzan 2008) and short-term (Mair 2008). In addition, corpus resources inform...